Search Result

Journals

Publication Years

Keywords

Please wait a minute...

For Selected:

Download Citations
EndNote Ris BibTeX

Toggle Thumbnails

Select

Distributed deduplication storage system based on Hadoop platform

LIU Qing, FU Yinjin, NI Guiqiang, MEI Jianmin

Journal of Computer Applications 2016, 36 (2): 330-335. DOI: 10.11772/j.issn.1001-9081.2016.02.0330

Abstract （865）

PDF （985KB）（1310）

Save

Focusing on the issues that there is a lot of data redundancy in data center, especially the backup data has caused a tremendous waste on storage space, a deduplication prototype based on Hadoop platform was proposed. Deduplication technology which detects and eliminates redundant data in a particular data set can greatly reduce the data storage capacity and optimize the utilization of storage space. Using the two big data management tools——Hadoop Distributed File System (HDFS) and non-relational database HBase, a scalable and distributed deduplication storage system was designed and implemented. In this system, the MapReduce parallel programming framework was responsible for parallel deduplication, and HDFS was responsible for data storage after deduplication. The index table was stored in HBase for efficient chunk fingerprint indexing. The system was also tested with virtual machine image file sets. The results demonstrate that the Hadoop based distributed deduplication system can ensure high throughput and excellent scalability as well as guaranting high deduplication rate.

Reference | Related Articles | Metrics